Big data driven perovskite solar cell stability analysis

During the last decade, lead halide perovskites have shown great potential for photovoltaic applications. However, the stability of perovskite solar cells still restricts commercialization, and the lack of properly implemented, unified standards for stability testing and dissemination makes it difficult to compare historical stability data when evaluating promising routes towards better device stability. Here, we propose a single indicator to describe device stability that normalizes the stability results with respect to different environmental stress conditions, which enables a direct comparison of different stability results. Based on this indicator and an open dataset of heterogeneous stability data for over 7000 devices, we have conducted a statistical analysis to assess the effect of different stability improvement strategies. This provides important insights for achieving more stable perovskite solar cells, and we also provide suggestions for future directions in the perovskite solar cell field based on big data utilization.

As the authors themselves admit, perovskite degradation is a complex process where different stressors interact with each other. For instance, in Sn-based perovskites, it is known that the presence of humidity and oxygen sets up a vicious cycle of oxidation that kills the luminescence efficiency. Similar co-dependencies have been reported for most other perovskites. For this reason, a simplistic model that treats stressors as separate, and ignores oxygen and biasing conditions, is not satisfactory. For this metric to be convincing, either:
1. Gather experimental data to show that the effects of different stressors don't influence one another in perovskite stability testing.
2. (Easier) Look into your dataset to see if you have cells for which humidity, illumination and temperature have been reported, and use them to demonstrate the validity of equation 5.
In my opinion, the above issue is a major one and prevents the model from being used as more than a crude metric. Even as a crude metric, it is not very convincing. In Fig. 2c, the majority of the cells are said to have a Ts80m of many hundreds of hours. In practice, very few cells have such performance today. Thus Ts80m largely overestimates stability, probably because it fails to consider how stressors reinforce each other.
The range of values that gamma and activation energy can take is likely to introduce some uncertainty in the calculations. The authors should report what this uncertainty is.
Furthermore, the analysis reveals nothing new. Almost everything is well known: all-inorganic perovskites are more stable than hybrid perovskites, n-i-p is historically more stable than p-i-n (now being overturned), 2D/3D heterostructures are more stable than 3D... The idea that the tolerance factor strongly determines stability is perhaps shown here for the first time.
The authors have provided all necessary details for the paper to be reproduced.
The stability model developed by the authors is too simplistic to account for the complex nature of perovskite degradation where stressors reinforce each other. Furthermore, the analysis validates what is already known in the field, and does not offer anything new. Since these are major issues with the paper, perhaps difficult to correct without major re-writing, I suggest that it be submitted to a different journal.

Reviewer #3 (Remarks to the Author):
This manuscript proposes big data driven perovskite solar cell stability analysis. The topic is interesting, and certainly consistent with the contents to be proposed to the readers of "Nature Communications". Moreover, the manuscript is well written and can be read with pleasure: this represents an important aspect in the current scenario of publications in international journals. Overall, I think that this manuscript has to be accepted, but the Authors should take into account the following minor revisions (in terms of bibliographic updates, grammar corrections and content deepening):
- It is difficult to review a manuscript without page numbers!!!
- Detailed revisions: I spent several hours reading this manuscript, and Authors are asked to follow carefully the attached PDF file where I highlighted some points to be addressed. The attached file also contains language mistakes and typos; some questions related to manuscript contents could also be present and Authors must consider them properly before submitting the revised manuscript. A point-by-point reply is required when the revised files are submitted.
- The Introduction should give a wider overview on the present scenario related to hybrid photovoltaics, both in terms of recently published reviews and research articles. In particular, emerging sustainable, integrated and unconventional PV devices are missing and a paragraph on this topic is highly suggested to be added in the Introduction. Authors are invited to go through the literature published in the last six months on these issues, and also on concepts developed some years ago in this field. Some of them are also mentioned in the above-mentioned PDF file.
- Authors should provide a clear explanation on the experimental error of the proposed research work. In particular, reproducibility of the phenomena described in the manuscript should be clearly stated in the "Results and Discussion" section; besides, some notes in the "Materials and Methods" section should be added highlighting which kind of experimental approach has been followed to check the reproducibility of the proposed system, the latter being of noteworthy importance in the present research field.
In their review of the first version of this manuscript, reviewer #3 added some comments to the manuscript file. These comments were forwarded to the authors, who replied as included in this Peer Review File.

Point-by-point reply to reviewer comments
We thank all the reviewers for their valuable comments, which we have used to improve our work. Please find below the point-by-point reply to the comments, with the replies in blue. The revisions made in the revised manuscript are highlighted in yellow.

Reviewer 1:
In this paper, the authors proposed a single indicator TS80m for stability evaluation of PSCs under various testing conditions and analyzed the collected stability data in the Perovskite Database Project to assess the influence of different perovskite compositions and device configurations on PSC stability. It is interesting to draw conclusions with all available historic data rather than experimental results with specific conditions, and the analysis method can attract many researchers in the same field. It would be a valuable contribution to the field. I recommend it for publication after minor revisions. Please find below the specific comments and suggestions.
We thank the reviewer for the positive comments.

Question 1:
In section 2.2 paragraph 2, the authors said that 1 h of stability at 85°C, 85% RH, and 1 sun illumination corresponds to 184 h of stability in dark and dry conditions at room temperature. According to the formulas and tables given by the authors, there may be some miscalculations. The authors should check them.
We thank the reviewer for checking our work in detail and pointing out this.
The value is calculated with the acceleration-factor equation in the manuscript. The reference conditions in that equation are 27 °C, 0% RH and 1 sun illumination, and the resulting value of A is 184, which is correct. However, we mistakenly described the reference conditions as "dark" instead of "1 sun illumination"; this has now been fixed.
Revised manuscript: "The heuristic used here assumes that 1 h of stability at 85 °C, 85% RH, and 1 sun illumination corresponds to 184 h of stability at 1 sun illumination and dry conditions at room temperature, and 1000 hours at those conditions would thus correspond to over 20 years at our chosen standard conditions (i.e., 27 °C, 0% RH and 1 sun illumination)."
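As a quick arithmetic check of the "over 20 years" statement, the sketch below multiplies 1000 h by the acceleration factor A = 184 given in the reply. The 8766 h average year (365.25 days) is our own assumption, and the helper name equivalent_years is hypothetical.

```python
# Arithmetic check of the conversion quoted above: a stability time measured
# under stress is multiplied by the combined acceleration factor A to give the
# equivalent time at the reference conditions (27 °C, 0% RH, 1 sun).
HOURS_PER_YEAR = 365.25 * 24  # average year length, 8766 h (assumed convention)


def equivalent_years(test_hours: float, acceleration_factor: float) -> float:
    """Equivalent lifetime in years at the reference conditions (hypothetical helper)."""
    return test_hours * acceleration_factor / HOURS_PER_YEAR


years = equivalent_years(1000, 184)
print(f"{years:.1f} years")  # prints "21.0 years", i.e. "over 20 years"
```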

Question 2:
In section 4.1 paragraph 2, the authors compared the stability of devices with different HTLs and electrodes. Since inorganic HTLs and carbon electrodes were proved to affect the efficiency of PSCs, a comparison of efficiency and some discussions should be added to verify whether the stability gain balances the efficiency loss.
Following the reviewer's suggestion, we have compared the efficiencies of devices with different HTLs: inorganic-HTL devices show a ~4% drop compared with organic-HTL devices, and carbon-based devices show a ~7% drop overall. We use the product of efficiency and stability gain to compare the total energy output of different devices before the efficiency drops to 80% of its initial value. For inorganic-HTL devices, the loss in efficiency makes them less competitive. However, carbon-based devices still deliver 2-4 times more energy than organic-HTL devices despite their efficiency being reduced to about a half, which makes carbon electrodes a promising candidate for commercialization. A detailed discussion has been added to the manuscript.
Revised manuscript: "In addition, devices with inorganic HTLs and/or carbon electrodes usually have lower efficiencies, so we also consider the balance between efficiency and stability. We use the product of efficiency gain and stability gain as an indicator to compare the total energy outputs before the efficiency drops below 80%. The results are shown in Supplementary Fig. 11."
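The energy-output comparison described above can be sketched as a simple product of relative gains. This is a minimal illustration, not the manuscript's calculation: the helper energy_output_ratio is hypothetical, output is assumed to scale with initial efficiency times TS80m, and the inputs (about half the efficiency and a 7 times stability gain for carbon electrodes, both relative to organic-HTL devices) are taken loosely from the replies in this file.

```python
# Sketch of the "efficiency gain x stability gain" indicator.
# Gains are relative to an organic-HTL reference device (gain = 1.0).
# Assumption: energy delivered before the efficiency falls below 80% of its
# initial value scales with initial efficiency multiplied by T_S80m.


def energy_output_ratio(efficiency_gain: float, stability_gain: float) -> float:
    """Relative energy output versus the reference device (hypothetical helper)."""
    return efficiency_gain * stability_gain


# Illustrative inputs only: roughly half the efficiency, ~7x the lifetime.
carbon = energy_output_ratio(efficiency_gain=0.5, stability_gain=7.0)
print(carbon)  # prints 3.5: more energy than the reference despite lower efficiency
```

A ratio above 1 means the stability gain outweighs the efficiency loss; because the indicator is a plain product, halving the efficiency is compensated by any stability gain above 2.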

Question 3:
In section 4.1 paragraph 2, the authors compared the stability of devices with spiro, P3HT, and PTAA. How did the authors deal with multi-HTL devices? If these devices were misclassified, the results would be biased.
We thank the reviewer for the question.

Multi-HTL devices were included in the statistical samples of every HTL they contain, and the combined effects of multiple HTLs reduced the differences between samples. In the revised manuscript, we focus on single-HTL devices, which better reflects the role of the HTL material. P3HT devices are still the most stable, consistent with the previous conclusion, and the hypothesis test shows a clearer difference, with a TA/TB ratio of ~1.2. A detailed discussion has been added to the revised manuscript.
Revised manuscript: "For devices based on some of the most commonly used organic HTLs, including spiro-MeOTAD, P3HT, and PTAA, the analysis shows a 1.2 times stability gain for P3HT (Supplementary Fig. 9 and Supplementary Table 9, SI), and the kernel density estimation shows a peak of more stable devices with P3HT. This means that P3HT is a better choice among the organic HTLs."
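The exact test is described in Supplementary Note 2 and is not reproduced here, so the following is only a sketch under stated assumptions: TA/TB is taken as the ratio of geometric means of TS80m, and two groups are compared with Welch's t statistic on log10(TS80m), using a normal approximation for the p-value. The samples are synthetic, and compare_log_ts80m and sample are hypothetical helpers. The loop also shows why small groups tend to give unaccepted hypotheses: the same ~1.2 times difference yields a much smaller t statistic when n is small.

```python
import math
import random
import statistics


def compare_log_ts80m(ts80m_a, ts80m_b):
    """Compare two groups of T_S80m values (hours) on a log10 scale.

    Returns (ratio, t, p):
      ratio - ratio of geometric means (assumed stand-in for T_A/T_B)
      t     - Welch's t statistic on log10(T_S80m)
      p     - two-sided p-value from a normal approximation (large samples only)
    """
    la = [math.log10(x) for x in ts80m_a]
    lb = [math.log10(x) for x in ts80m_b]
    diff = statistics.fmean(la) - statistics.fmean(lb)
    se = math.sqrt(statistics.variance(la) / len(la) + statistics.variance(lb) / len(lb))
    t = diff / se
    p = math.erfc(abs(t) / math.sqrt(2))
    return 10 ** diff, t, p


# Hypothetical log-normal samples: group A is ~1.2x more stable than group B.
rng = random.Random(0)


def sample(n, log_mean, log_sd=0.6):
    """Draw n synthetic T_S80m values whose log10 is normally distributed."""
    return [10 ** rng.gauss(log_mean, log_sd) for _ in range(n)]


for n in (15, 500):
    ratio, t, p = compare_log_ts80m(sample(n, 2.58), sample(n, 2.50))
    print(f"n = {n:3d}: ratio = {ratio:.2f}, t = {t:.2f}, p = {p:.3g}")
```

For small groups, a t-distribution (for example via SciPy) would be more appropriate than the normal approximation used here.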

Question 4:
In section 4.2 paragraph 2, the authors compared the stability of devices with different ETLs, mainly SnO2 and TiO2. Though there are statistical differences in the results, solid conclusions are hard to reach because of the small TA/TB ratio and deformed distribution curves. Since the same ETL material with different structures and deposition methods can lead to different stability, the authors could give a more detailed discussion accordingly.
We thank the reviewer for the suggestion.
We have regrouped the data into TiO2-c, TiO2-c/TiO2-mp, SnO2-c, SnO2-np and other ETLs (c for compact, mp for mesoporous, np for nanoparticle). The results show no difference among devices based on TiO2-c, SnO2-c and other ETLs, while SnO2-np devices are more stable than TiO2-c/TiO2-mp devices. However, TiO2 is more often chosen for the most stable devices. Regarding TiO2 deposition procedures, chemical-bath-deposited TiO2-c layers and TiO2-c/TiO2-mp layers prepared by spray pyrolysis/spin coating show an obvious stability improvement over spin-coated TiO2-c layers. A detailed discussion has also been added to the manuscript.
Supplementary Fig. 15 The normal probability plots of log(TS80m) values for devices with different ETLs without encapsulation. a, TiO2-c by spin-coating. b, TiO2-c by spray-pyrolysis. c, TiO2-c by CBD. d, TiO2-c/TiO2-mp by spin-coating/spin-coating. e, TiO2-c/TiO2-mp by spray-pyrolysis/spin-coating.

Question 5:
Though the authors used all available historic data to perform statistical analysis, some groups still have small data sizes and large variances. How do these affect the statistical results and the reliability of the conclusions? The authors should add a discussion in the main text.
We thank the reviewer for the suggestion.
According to the hypothesis test method, small data sizes and large variances tend to produce unaccepted hypotheses (i.e., no statistically significant difference is detected). Nevertheless, the strategies that show an obvious stability improvement remain credible. We have added a discussion to the manuscript.

Reviewer 2:
The main contribution of the paper is the introduction of a new stability metric (Ts80m), which accounts for temperature, humidity and light intensity, and therefore allows measurements made under different conditions to be compared. Using this metric to analyse the dataset, the authors draw several conclusions, the most noteworthy of which is, perhaps, the (fully expected) dependence of stability on the tolerance factor.
We thank the reviewer for the comments, which help improve the quality of our manuscript.

Question 1:
As the authors themselves admit, perovskite degradation is a complex process where different stressors interact with each other. For instance, in Sn-based perovskites, it is known that the presence of humidity and oxygen sets up a vicious cycle of oxidation that kills the luminescence efficiency. Similar co-dependencies have been reported for most other perovskites. For this reason, a simplistic model that treats stressors as separate, and ignores oxygen and biasing conditions, is not satisfactory. For this metric to be convincing, either: (1) Gather experimental data to show that the effects of different stressors don't influence one another in perovskite stability testing.
(2) (Easier) Look into your dataset to see if you have cells for which humidity, illumination and temperature have been reported, and use them to demonstrate the validity of equation 5.
Although TS80m is a single indicator intended for rough estimation, it enables a simple and effective comparison of PSC device stability and leads to correct and specific conclusions. To further refine the indicator, a more accurate and general mathematical model that includes all testing parameters and degradation processes is needed. However, most publications on degradation processes only investigate the degradation pathways and products, and those that report quantitative degradation rates focus on specific compositions and limited testing conditions. Overall, there is still no unified model that covers all testing conditions and device types, and that is exactly what we plan to investigate next. For now, however, TS80m is the most effective indicator we can derive from existing work in the field. Moreover, much work remains to be done on PSC stability standards and statistical analysis, and we aim to provide a feasible example that encourages more contributions on this topic.
We have also tried to look into the dataset to find if there are some groups of data that can validate the co-dependencies. To do this, the data in one group need to meet some requirements.
(1) The data come from the same publication. Due to the variance between different laboratories, material suppliers, experimental conditions, instruments, and some other hidden variables, devices usually have different performances even if all reported parameters (compositions, device structures, preparation processes, testing conditions, etc.) are the same. If we choose data from the same publication, the differences can be eliminated as much as possible.
(2) The same composition, device structure and preparation process are used. As widely reported, the device performance is influenced by these parameters.
(3) The data contain specific combinations of testing conditions. For example, if we consider the co-dependencies between temperature (T) and humidity (H), a valid group of data should contain results under at least four environmental conditions, (Tlow, Hlow), (Thigh, Hlow), (Tlow, Hhigh) and (Thigh, Hhigh). Then we can compare the co-effect of the temperature and humidity and their separate effects. More data are needed if light illumination is also considered.
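Requirement (3) amounts to a two-level factorial design. Under the multiplicative model behind TS80m, separate stressor effects add on a log scale, so the interaction contrast of a 2 × 2 temperature/humidity design should be close to zero, while a clearly non-zero value would indicate co-dependency. The sketch below uses hypothetical lifetimes and a hypothetical helper interaction_contrast, and it ignores measurement noise and replicate devices.

```python
import math
from itertools import product

LEVELS = ("low", "high")

# Two-level factorial designs: 4 conditions for (temperature, humidity),
# 8 once light intensity is added as a third stressor.
design_th = list(product(LEVELS, repeat=2))
design_thl = list(product(LEVELS, repeat=3))


def interaction_contrast(t80):
    """Temperature x humidity interaction on a log10 lifetime scale.

    t80 maps (temperature_level, humidity_level) -> T_S80 in hours.
    Returns ~0 if the stressors act independently (multiplicatively); a
    negative value means the combined stress shortens the lifetime more than
    the product of the separate effects predicts.
    """
    lg = {cond: math.log10(t80[cond]) for cond in design_th}
    return (lg[("high", "high")] - lg[("high", "low")]
            - lg[("low", "high")] + lg[("low", "low")])


# Hypothetical lifetimes (hours) for one device recipe from one publication:
# high T alone costs 10x, high H alone costs 5x, so independence predicts 20 h.
independent = {("low", "low"): 1000, ("high", "low"): 100,
               ("low", "high"): 200, ("high", "high"): 20}
synergistic = {**independent, ("high", "high"): 2}

print(interaction_contrast(independent))
print(interaction_contrast(synergistic))
```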
Following the reviewer's request, we screened and checked all data in the database, but no such group of data is available, because the current way of reporting device stability is brief and lacks a standard. We found that most publications provide stability results under only one or two environmental conditions, and those with more than four stability results either vary only one environmental stressor in a gradient or choose testing conditions arbitrarily. This limits the validation of the co-dependency.
We have also tried to validate the co-dependency through the distribution of all data, using temperature and humidity as an example. According to the Arrhenius model, the device performance decay rate, k, is a function of temperature, T:

$$k = A\exp\left(-\frac{E_a}{k_B T}\right)$$
As the time to failure is inversely proportional to the degradation rate, TS80 is described as
$$T_{S80} = \frac{B}{k} = \frac{B}{A}\exp\left(\frac{E_a}{k_B T}\right), \qquad \log_{10} T_{S80} = C + \frac{E_a}{\ln(10)\, k_B T},$$
where Ea is the effective activation energy of the degradation process, kB is the Boltzmann constant, and A, B, C are constants (C = log10(B/A)). As the equation shows, log(TS80) depends linearly on 1/T, and the slope of the line is set by the effective activation energy, which represents the sensitivity to temperature and is usually constant for a given process. If temperature and humidity are co-dependent, meaning that the combined effect of high temperature and high humidity deviates greatly (in either direction) from the simple product of the separate effects of each stressor, the slope of the log(TS80) versus 1/T line will show a systematic trend as humidity increases.
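To make the link between the fitted slope and Ea concrete, the sketch below fits log10(TS80) against 1000/T by ordinary least squares and converts the slope back into an activation energy. It assumes base-10 logarithms and a 1000/T axis in K⁻¹, noise-free synthetic data generated with an assumed Ea of 0.5 eV, and hypothetical helper names (fit_slope, ea_from_slope).

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_slope(temps_k, ts80_h):
    """Ordinary least-squares slope of log10(T_S80) against 1000/T."""
    x = [1000.0 / t for t in temps_k]
    y = [math.log10(v) for v in ts80_h]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def ea_from_slope(slope):
    """Effective activation energy (eV) from a log10(T_S80) vs 1000/T slope.

    From log10(T_S80) = C + Ea / (ln(10) kB T), the slope against 1000/T is
    Ea / (1000 ln(10) kB).
    """
    return slope * 1000.0 * math.log(10) * K_B


# Noise-free synthetic Arrhenius data with an assumed Ea of 0.5 eV and an
# arbitrary prefactor B/A = 1e-6 h.
ea_true = 0.5
temps = [300.15, 318.15, 338.15, 358.15]
ts80 = [1e-6 * math.exp(ea_true / (K_B * t)) for t in temps]

slope = fit_slope(temps, ts80)
print(f"slope = {slope:.3f}, recovered Ea = {ea_from_slope(slope):.3f} eV")
```

Under the same axis assumptions, slopes of 2.2 and 2.7 would correspond to roughly 0.44 and 0.54 eV; if the plots used natural logarithms instead, the conversion would change by a factor of ln(10).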
The log(TS80) versus 1000/T plots of devices at different humidity levels are shown in Figures R1 to R3. As the figures show, the slope for MAPbI3 devices increases slightly from 2.2 to 2.7 below 60% RH and then drops sharply to -0.5. The negative slope at very high humidity may result from errors caused by the small data size and selection bias: devices with poor stability tend to be tested at lower temperatures, while some stable devices tend to be tested under high environmental stresses (e.g., 85 °C, 85% RH), which results in many high-lifetime datapoints in the double-85 region. For FAPbI3-based and all-inorganic devices, no definite trend is observed, although negative slopes are also obtained at high humidity. Thus, specific mathematical relationships cannot be derived from the dataset. Moreover, most of the datapoints are far from the fitted lines, which means that even if co-dependencies were included in TS80m, the indicator might not lead to more precise conclusions and would instead introduce more unknown parameters and uncertainty.

Our idea of TS80m comes from accelerated degradation tests, where degradation tests lasting hundreds of hours under harsh conditions are used to predict device lifetimes of tens of years. In such cases, the predicted lifetime is usually much longer than the test time. In this work, we use TS80m, which estimates the lifetime under the reference conditions (27 °C, 0% RH and 1 sun illumination), as the indicator to uniformly assess device stability under many different testing conditions. Because a device has a longer lifetime under milder conditions, and the reference conditions are milder than the actual testing and working conditions, TS80m takes larger values than common testing results. This is a conversion rather than an overestimation.
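The argument that TS80m is a conversion rather than an overestimate can be illustrated with the temperature factor alone. In the sketch below (assumptions: an Arrhenius temperature factor only, Ea = 0.5 eV, hypothetical devices and helper names), moving the reference temperature from 27 °C to 85 °C multiplies every device's TS80m by the same constant, so ratios between devices, and hence comparisons between groups, are unchanged. The same reasoning extends to humidity and light only if those factors also compose multiplicatively between conditions, which is an assumption here.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_factor(t_test_c, t_ref_c, ea=0.5):
    """Factor converting a lifetime measured at t_test_c (°C) to t_ref_c (°C).

    ea is an assumed activation energy in eV; only temperature is modelled.
    """
    t_test, t_ref = t_test_c + 273.15, t_ref_c + 273.15
    return math.exp(ea / K_B * (1 / t_ref - 1 / t_test))


def ts80m_temperature_only(ts80_h, t_test_c, t_ref_c):
    """Hypothetical helper: T_S80 rescaled to the reference temperature."""
    return ts80_h * arrhenius_factor(t_test_c, t_ref_c)


# Hypothetical devices: (measured T_S80 in hours, test temperature in °C).
devices = [(500, 25), (120, 65), (40, 85)]

for t_ref in (27, 85):
    values = [ts80m_temperature_only(h, t, t_ref) for h, t in devices]
    # Absolute values change with t_ref; the ratio between devices does not.
    print(t_ref, [round(v, 1) for v in values], round(values[0] / values[2], 4))
```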
In fact, the reference conditions can be chosen freely. If we choose 85 °C, 85% RH and 1 sun illumination as the reference conditions, TS80m values become much smaller, but this has no influence on the hypothesis test results. We have also added a detailed discussion and verification to the manuscript.
Revised manuscript: "… Supplementary Fig. 21), and the hypothesis test conclusion about the tolerance factor remains the same (Supplementary Table 21)."
Supplementary Fig. 21 The distribution of all data with the reference conditions of 85 °C, 85% RH and 1 sun illumination. a, Histogram of log(TS80m) values for all data. b, The kernel density estimation of the log(TS80m) values for different tolerance factor regions of 3D perovskite devices without encapsulation.

Question 3:
The range of values that gamma and activation energy can take is likely to introduce some uncertainty in the calculations. The authors should report what this uncertainty is.
γ and Ea appear as parameters in the acceleration factors for humidity and temperature, respectively, and their admissible ranges are taken from previous research in the field. In practice, γ represents the sensitivity of the device lifetime to humidity, and Ea its sensitivity to temperature (e.g., choosing a larger Ea makes the temperature acceleration factor larger under the same environmental conditions). We varied Ea and γ to examine their influence on the results. The value of Ea only changes the TA/TB ratios and does not change the conclusions even when extreme values are used, while γ has no influence on the hypothesis test results because of the linear relationship. We have added the results and the discussion to the manuscript.

… it demonstrates that those well-known intuitions hold even when all available data are considered. The stability improvement strategies are all based on lessons learned from many controlled experiments. We are not repeating specific experiments to obtain results consistent with existing intuitions, but turning the well-known intuitions into definite and reliable conclusions with statistical analysis methods.
Moreover, besides qualitative conclusions, the statistical method also provides quantitative comparisons of the stability improvement achieved by different strategies. For example, carbon-based devices have a 7 times longer lifetime, whereas encapsulation gives only a 2.5 times improvement. Such a macro-level assessment cannot be obtained from a single experiment, and the values reported in publications, which usually focus on single strategies, are not always applicable in all cases. This makes it difficult to choose a combination of strategies for commercialization, which is the problem we are trying to solve.
We also give some suggestions for choosing stable device structures based on our statistical results. Interestingly, no device containing all of the suggested options has yet been reported in the Perovskite Database, which points to an obvious direction for further experimental studies.

Reviewer 3:
This manuscript proposes big data driven perovskite solar cell stability analysis. The topic is interesting, and certainly consistent with the contents to be proposed to the readers of "Nature Communications". Moreover, the manuscript is well written and can be read with pleasure: this represents an important aspect in the current scenario of publications in international journals. Overall, I think that this manuscript has to be accepted, but the Authors should take into account the following minor revisions (in terms of bibliographic updates, grammar corrections and content deepening).
We thank the reviewer for the positive comments.

Question 1:
It is difficult to review a manuscript without page numbers!!!
We thank the reviewer for the suggestion. We have changed the layout and added page numbers in the manuscript.

Question 2:
Detailed revisions: I spent several hours reading this manuscript, and Authors are asked to follow carefully the attached PDF file where I highlighted some points to be addressed. The attached file also contains language mistakes and typos; some questions related to manuscript contents could also be present and Authors must consider them properly before submitting the revised manuscript.
We thank the reviewer for the help with the details. We have fixed all the errors marked in the PDF file.

Question 3:
The Introduction should give a wider overview on the present scenario related to hybrid photovoltaics, both in terms of recently published reviews and research articles. In particular, emerging sustainable, integrated and unconventional PV devices are missing and a paragraph on this topic is highly suggested to be added in the Introduction. Authors are invited to go through the literature published in the last six months on these issues, and also on concepts developed some years ago in this field. Some of them are also mentioned in the above mentioned PDF file.
We thank the reviewer for the suggestions. We have added some revisions in the main text.
Revised manuscript: "Together with the advantages of low cost and simple preparation as previously reported [52], carbon electrodes will be a promising candidate for commercialization." "Encapsulation has proved to be a simple and effective strategy to improve the external stability of PSCs by preventing the penetration of moisture and oxygen [55][56][57] and to prevent lead leakage [58], which is a necessary part of commercialization."

Question 4:
Authors should provide a clear explanation on the experimental error of the proposed research work. In particular, reproducibility of the phenomena described in the manuscript should be clearly stated in the "Results and Discussion" section; besides, some notes in the "Materials and Methods" section should be added highlighting which kind of experimental approach has been followed to check the reproducibility of the proposed system, the latter being of noteworthy importance in the present research field.
We thank the reviewer for the suggestions. We have discussed the uncertainty and reproducibility from four perspectives: the influence of the ranges of the parameters (Ea and γ), the influence of the reference conditions, the reliability of the hypothesis test method, and the data from the Perovskite Database Project.
According to the results provided in the manuscript and SI, the value of Ea only changes the TA/TB ratios and does not change the conclusions even when extreme values are used, while γ has no influence on the hypothesis test results because of the linear relationship. The reference conditions only change the value of TS80m, because TS80m represents the estimated lifetime under the reference conditions; they have no influence on the shape of the data distribution or on the hypothesis test results. For the hypothesis test method, some unaccepted hypotheses may be caused by small data sizes and large variances, but the strategies that show an obvious stability improvement are credible. For the data from the Perovskite Database, although only a small number of publications are included, the dataset is sufficient to draw clear and credible conclusions. We have added a discussion of error and reproducibility to the manuscript.
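As a rough indication of the parameter uncertainty the reviewer asked about, the sketch below evaluates the Arrhenius temperature acceleration factor between 85 °C and the 27 °C reference for a range of Ea values. The 0.3 to 0.7 eV range is an illustrative assumption, not the range used in the manuscript, only the temperature factor is considered, and temperature_acceleration is a hypothetical helper. Because the factor grows exponentially with Ea, groups tested mostly at high temperature shift relative to groups tested near room temperature, which is consistent with the statement that Ea changes the TA/TB ratios.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def temperature_acceleration(ea_ev, t_test_c=85.0, t_ref_c=27.0):
    """Arrhenius acceleration factor from the test temperature to the reference."""
    return math.exp(ea_ev / K_B * (1 / (t_ref_c + 273.15) - 1 / (t_test_c + 273.15)))


# Illustrative Ea values only; the manuscript's own range may differ.
for ea in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"Ea = {ea:.1f} eV -> A_temperature (85 to 27 °C) = {temperature_acceleration(ea):.1f}")
```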

"Discussion on uncertainty and reproducibility
The indicator TS80m is calculated by converting the three main environmental stresses (temperature, humidity and light intensity) into separate acceleration factors and multiplying them by TS80. Uncertainty will come from the co-dependencies between different stressors, the range of the parameters (Ea in Atemperature and γ in Ahumidity) and the chosen reference conditions.
For the range of parameters (Ea and γ), different values make TS80m more or less sensitive to the environmental stresses. For example, with a larger Ea value, a device will obtain a higher TS80m from Atemperature. Supplementary Tables 19 and 20 show that the average TS80m is positively related to both Ea and γ. However, only Ea influences the hypothesis test results, because of the exponential relationship, while a change in γ has the same effect on all devices, which keeps the results unchanged. Thus, reasonable parameter values are needed for lifetime estimation, but the hypothesis test is less affected.
In addition, different reference conditions do not affect the conclusions. TS80m predicts the lifetime under the reference conditions (27 °C, 0% RH and 1 sun illumination), which are much milder than the actual testing and working conditions; thus the indicator seems to overestimate device stability. We also chose 85 °C, 85% RH and 1 sun illumination as the reference conditions and recalculated TS80m. The results show that all data points merely shift to smaller values without any change in the shape of the distribution (Supplementary Fig. 21, SI), and the hypothesis test conclusion about the tolerance factor remains the same (Supplementary Table 21).
The details of the hypothesis test method are described in Supplementary Note 2 (SI). An accepted hypothesis, meaning that there is a statistically significant difference between two samples, requires large sample sizes, small variances and large differences between the averages. Thus, the limitations of the data (small data sizes and large variances) tend to give an unaccepted hypothesis, while the strategies that show an obvious stability improvement are credible.
As mentioned above, the Perovskite Database contains stability data for 7419 devices, with publication dates from 2012.08.21 to 2021.05.21 at the time of writing. Note that only a small number of publications are included, but the dataset is sufficient to draw conclusions that are consistent with the current state of the field. However, the research focus of the PSC field changes over time (e.g., the shift in mainstream perovskite compositions), so the conclusions may not always hold and may be overturned in the future. Time-dependent statistical analysis is needed to draw dynamic conclusions, which is beyond the scope of this work."

"Materials and Methods
Data were downloaded from the Perovskite Database Project on 2022.01.18.